1. INTRODUCTION

The growing use of the internet has led to network congestion. This problem cannot be solved by 
relying only on the congestion control mechanism provided by the source node. Congestion control at 
intermediate nodes consists of two parts: queue management and queue scheduling [1], [2]. Queue 
scheduling addresses the allocation of network bandwidth, while queue management maintains stability by 
choosing which packets to drop in the router [3], [4]. In the tail drop algorithm, the router stores as many 
packets as possible and drops those that cannot be stored once the buffer is full [5], [6]. The random early 
detection (RED) algorithm, one type of active queue management (AQM), monitors the queue size and 
drops packets based on statistical probabilities [7]. If the buffer is almost empty, all packets are accepted; as 
the queue grows, the probability of dropping an arriving packet increases; and if the buffer is full, all 
packets are dropped [8]. RED is considered fairer than tail drop because it does not penalize traffic that uses 
little bandwidth: the more packets a source sends, the more of its packets are dropped [9]. 
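
As a rough illustration of the probabilistic drop decision described above, the following Python sketch follows the classic RED formulation, in which an exponentially weighted moving average of the queue length is compared against two thresholds. The names `min_th`, `max_th`, `max_p`, and `weight`, and their default values, are illustrative assumptions and are not taken from this study.

```python
import random

def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * queue_len

def red_drop_probability(avg, min_th=50.0, max_th=150.0, max_p=0.1):
    """Drop probability as a function of the average queue length (packets)."""
    if avg < min_th:
        return 0.0   # short queue: accept every packet
    if avg >= max_th:
        return 1.0   # long queue: drop every packet
    # Between the thresholds the probability grows linearly up to max_p.
    return max_p * (avg - min_th) / (max_th - min_th)

def red_should_drop(avg, rng=random):
    """Randomly decide whether an arriving packet is dropped."""
    return rng.random() < red_drop_probability(avg)
```

Because the decision is random and proportional to the average queue length, a flow that sends more packets has more of its packets exposed to the drop decision, which is the fairness property mentioned above.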

Reinforcement learning (RL) learns a mapping from inputs to outputs. RL requires a set of 
environment states, and during training the agent carries out the possible actions in particular states 
[10]. The agent begins by exploring the actions and states of the environment, then uses what it has learned 
to obtain a reward, and continues learning until the best reward is reached [11], [12]. 

This study investigates the trade-off between queuing delay and throughput using a deep 
reinforcement learning framework integrated with an AQM scheme (RED) for effective network control 
[13]. A deep Q-network (DQN) is used to build our application [14]. Both the primary Q-network and the 
target network are used together with experience replay. At each packet departure, the agent selects a drop 
or non-drop action based on the current state, which comprises the dequeue rate, enqueue rate, drop rate, 
and average queue length. After an action is selected, a reward is calculated from a number of parameters 
that are discussed later. 
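
To make the state, action, and experience replay concrete, the following Python sketch stores transitions in a fixed-size buffer and samples them uniformly. The four state fields come from the text; the class name `ReplayBuffer`, the field names, the action encoding, and the capacity are assumptions made for illustration only.

```python
import random
from collections import deque, namedtuple

# State features named in the text; the field names themselves are assumptions.
State = namedtuple("State", "dequeue_rate enqueue_rate drop_rate avg_queue_len")
Transition = namedtuple("Transition", "state action reward next_state")

NON_DROP, DROP = 0, 1  # the two actions available to the agent

class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly for training."""

    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)  # oldest entries are discarded first

    def push(self, state, action, reward, next_state):
        self.memory.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Never request more transitions than are currently stored.
        return random.sample(list(self.memory), min(batch_size, len(self.memory)))

    def __len__(self):
        return len(self.memory)
```

Sampling past transitions at random, rather than training on consecutive ones, reduces the correlation between training samples, which is the usual motivation for experience replay in DQN.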

Bouacida and Shihada [15] introduced the LearnQueue AQM algorithm in 2018, which applies 
reinforcement learning to wireless networks. By dynamically adjusting the buffer size with Q-learning over a 
specified period, they update the Q-table and refine the Q-function policy; however, they evaluated their 
method in only two or three deployed scenarios. In 2017, Bisoy et al. [16] proposed an AQM scheme based on a 
shallow neural network with one hidden layer of three neurons to address the non-linearity of the network 
system and the queuing delay, but their work did not consider the trade-off between throughput and delay 
performance. 

The reinforcement learning-queuing delay limitation (RL-QDL) AQM algorithm was proposed in 2007 by 
Vucevic et al. [17]. RL agents obtain topology information from the bandwidth broker, which handles resource 
management and quality of service (QoS) provisioning according to which QoS requirements are met at the egress 
routers (ERs). The scheme supports class-based queuing (CBQ) with three separate classes, expedited 
forwarding (EF), assured forwarding (AF), and best effort (BE) traffic, to provide end-to-end QoS to customers 
with specific service types. With respect to network scheduling algorithms, Zhou et al. [18] proposed in 2018 an 
automated computation offloading strategy based on deep reinforcement learning (DRL) by implementing a 
double DQN on the edge node. Compared with standard algorithms, their solution achieved the optimum 
trade-off between task latency and drop rate. Xu et al. [19] applied DRL to network traffic engineering in 2018 
using an actor-critic approach with prioritized experience replay. The authors compared their algorithm 
with commonly used baseline solutions, such as shortest path (SP), load balance (LB), and network utility 
maximization (NUM), and verified that their model performs better than these baselines. 


2. METHOD 

Reinforcement learning is achieved through the trial-and-error interaction of the agent with the 
environment in sequential time steps (t=1, 2, 3, ...). At each time step, the agent tries an action 
At ∈ A(s) from the set of actions available in the state St ∈ S. After the action At is tried, the agent 
receives a reward, and a new state St+1 is assigned. By repeating this operation, the resulting path can be 
expressed as a Markov decision process (MDP), as follows: (s1, a1, r1), (s2, a2, r2), ..., where sn is the 
state of the network, rn is the reward, and an is the action [20]. Q-learning works with three elements: the 
current state of the agent, the action, and the reward. 


2.1. Process of selecting an action 

For the RL environment, four components are considered: dequeue rate, enqueue rate, drop rate, and 
average queue length (avrg_queue_len). At each time step t, the state st is defined as st={dequeue rate, 
enqueue rate, drop rate, avrg_queue_len}, which is the input of a multilayer perceptron (MLP) with three 
hidden layers of 16, 32, and 16 neurons. To choose an action, the primary Q-network is used; it returns two 
probabilities as a result (drop and non-drop). To find a better action for a given state, an explore/exploit 
strategy is used: the agent either takes an action based on its own decision (exploit) or, with a certain 
probability, takes a uniformly random action (explore). The strategy starts with a highly random choice of 
actions: the exploration probability is set to 90% in the first episode of the network simulation and 
decreases to 0% over the course of the episodes. Figure 1 illustrates the decision process for an action. The 
agent keeps trying until it reaches the best reward [21]. 

Figure 1. Process of selecting an action 
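
The selection procedure described in this section can be sketched in Python as follows. The layer sizes (4 inputs, hidden layers of 16, 32, and 16 neurons, 2 outputs) and the 90% starting exploration probability come from the text. The ReLU activation, the random untrained weights, the linear decay schedule, and all function names are assumptions made for illustration; the outputs are treated as per-action scores.

```python
import random

LAYER_SIZES = [4, 16, 32, 16, 2]  # 4 state features, hidden 16-32-16, 2 actions

def init_mlp(sizes=LAYER_SIZES, seed=0):
    """Random (untrained) weights and zero biases for each layer."""
    rng = random.Random(seed)
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(mlp, state):
    """Forward pass: ReLU on hidden layers, linear output (one score per action)."""
    x = list(state)
    for i, (weights, biases) in enumerate(mlp):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(mlp) - 1:
            x = [max(0.0, v) for v in x]
    return x

def epsilon(step, total_steps, start=0.9):
    """Exploration probability decaying linearly from `start` to 0."""
    return max(0.0, start * (1.0 - step / total_steps))

def select_action(mlp, state, step, total_steps, rng=random):
    """Return 0 (non-drop) or 1 (drop) using the explore/exploit rule."""
    if rng.random() < epsilon(step, total_steps):
        return rng.randrange(2)                      # explore: uniformly random action
    scores = forward(mlp, state)
    return max(range(len(scores)), key=scores.__getitem__)  # exploit: best-scoring action
```

Early in training almost every action is random, so the agent samples both drop and non-drop decisions across many states; as the exploration probability falls, the choices increasingly follow the network output.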


2.2. Reward 

After taking an action, the RL agent waits for the next state st+1 during the interval Tint. 
The selected action is evaluated by a reward function. The main aim in designing the reward function is to 
optimize the trade-off between queuing delay and drop rate, as well as to avoid an endless 
packet-drop or non-drop state [22], [23]. 
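
The text does not give the reward formula, so the following Python sketch is only one hypothetical way to express such a trade-off, not the function used in this study. The weighting `delay_weight`, the normalization by `target_delay`, and the function name are all assumptions.

```python
def reward(queue_delay, drop_rate, target_delay=0.02, delay_weight=0.5):
    """Hypothetical reward in [-1, 0]: higher when delay and drop rate are both low.

    queue_delay  -- measured queuing delay in seconds
    drop_rate    -- fraction of packets dropped during the interval (0 to 1)
    """
    delay_penalty = min(max(queue_delay, 0.0) / target_delay, 1.0)  # normalized to [0, 1]
    drop_penalty = min(max(drop_rate, 0.0), 1.0)                    # clipped to [0, 1]
    return -(delay_weight * delay_penalty + (1.0 - delay_weight) * drop_penalty)
```

Under this form, an agent that always drops is penalized through the drop term, while an agent that never drops is penalized through the growing queuing delay, so neither extreme state is rewarded.
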
2.3. Training process 

At the beginning of learning, the agent chooses actions randomly using the explore/exploit strategy. 
At each choice, the reward is measured and recorded, and the Q-value is updated according to the 
reward achieved, as Figure 2 shows. In this way, the agent interacts with the environment and learns 
through accumulated rewards. 


Figure 2. Q-learning algorithm [24] 


The algorithm starts by taking a random action (explore), and an initial state is obtained. 
Then the second round begins; in each round the reward is recorded and the Q-value is updated through the 
Q-learning update equation (Figure 2). The new state then becomes the current state [25]. 
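
For reference, the standard tabular Q-learning update can be written as below; we assume this is the rule depicted in Figure 2. The learning rate \alpha and the discount factor \gamma are not specified in the text and appear here only as symbols.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

Here s_t, a_t, and r_t are the state, action, and reward of round t, and s_{t+1} is the new state that becomes the current state in the next round.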


3. RESULTS AND DISCUSSION 

This section evaluates the validity and performance of the designed DQN algorithm through NS3 
simulation experiments. The simulation uses the typical single-bottleneck network topology shown in 
Figure 3. The network has n senders (S1~Sn), n receivers (d1~dn), and one router (n2). The bandwidth and 
delay between each sender (n1) and the router (n2) are 100 Mbps and 0.1 ms, and the bandwidth and delay 
between each receiver and the router (n2) are 100 Mbps and 5 ms. For comparison, we analyzed the queue 
length, throughput, delay, and packet loss rate of the RED and DQN algorithms under changing network link 
capacity. The simulation time is 100 seconds. The tables below show the mean and standard deviation of the 
queue length for the RED algorithm and the DQN algorithm. As can be seen from the tables, the average 
queue length of the RED algorithm is larger than that of the DQN algorithm. The DQN algorithm therefore 
reduces the drop probability and the delay. 

Figure 3. Simulated network structure 


3.1. Experiment 1 

For the proposed network shown in Figure 3, the number of transmission control protocol (TCP) 
sessions (N) is 20, the bottleneck link capacity (C) is 0.5 Mbps, and the data rate is 1 Mbps. Decreasing the 
bottleneck link capacity (C) increases the probability of dropping packets. This is apparent from the mean 
and standard deviation of the queue length in Table 1 and from the throughput and drop rate in Table 4. As 
shown in Figure 4, the average queue length increases for both DQN and RED; there is a slight advantage 
for the DQN algorithm owing to the training process applied to the algorithm. 


Figure 4. Queue length of the RED and DQN algorithms (N=20, C=0.5 Mbps, data rate=1 Mbps; queue size in packets versus time in s) 


Table 1. The parameters of RED and DQN algorithms in experiment 1 
Scenario    Mean (packets)    Standard deviation (packets) 
RED         115.747           44.37 
DQN         114.323           41.78 


Discussion: decreasing the link capacity (C) increases the round-trip time (RTT) and the 
probability of dropping packets. This is apparent from the standard deviation values of the algorithms in 
Table 1. We can see that the DQN algorithm outperforms the RED algorithm in terms of overall network 
performance. 
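
The mean and standard deviation reported in the tables can be obtained from a sampled queue-length trace as in the following Python sketch. The sample values are made up for illustration and are not simulation output, and the choice of the population form of the standard deviation is an assumption, since the text does not say which form is used.

```python
import statistics

def queue_statistics(samples):
    """Return (mean, standard deviation) of queue-length samples in packets."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)  # population form; an assumption, see lead-in
    return mean, std

# Made-up samples for illustration only, not simulation output.
example_trace = [100, 120, 110, 130, 90]
mean, std = queue_statistics(example_trace)
```

A smaller standard deviation for a similar mean indicates that the queue length oscillates less around its average value.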


3.2. Experiment 2 

For the proposed network shown in Figure 3, the number of TCP sessions (N) is 20, the link 
capacity is 1 Mbps, and the data rate is 1 Mbps. The bottleneck capacity was thus changed from 0.5 Mbps to 
1 Mbps. This increase resulted in less congestion and fewer drops compared to experiment 1. The results 
again favor the DQN algorithm over the RED algorithm in terms of drop rate. Figure 5 shows the queue 
length; the DQN algorithm reaches a drop rate of 19.4% against 20.3% for the RED algorithm, a difference 
of 0.9 percentage points (Table 4), together with a throughput that is about 0.03 Mbps higher. This is due to 
the training of the algorithm and its adaptation to network congestion. 


Table 2. The parameters of RED and DQN algorithms in experiment 2 
Scenario    Mean (packets)    Standard deviation (packets) 
RED         119.959           29.308 
DQN         117.727           26.355 


Figure 5. Queue length of the RED and DQN algorithms 


Discussion: when the link capacity (C) is reduced, the round-trip time (RTT) increases, as does the 
likelihood of packets being dropped. The standard deviation values of the algorithms in Table 2 reveal 
this. We can observe that the DQN algorithm produces better overall network performance than the RED 
algorithm. 


3.3. Experiment 3 

For the proposed network shown in Figure 3, the number of TCP sessions (N) is 20, the link 
capacity is 5 Mbps, and the data rate is 1 Mbps. Table 3 lists the mean and standard deviation of the queue 
length. With the increase in bottleneck capacity, the network state improves markedly: this experiment 
shows the lowest drop rate, a good throughput, and the lowest average queue length (Figure 6). The DQN 
algorithm is again preferable, with a drop rate 0.8 percentage points lower than RED (14.6% against 15.4%) 
and a throughput about 0.04 Mbps higher (Table 4), owing to the training of the algorithm. Comparison with 
experiments 1 and 2 shows that the larger the bottleneck capacity, the lower the drop rate and the less 
congestion. 

Figure 6. Queue length of the RED and DQN algorithms 


Table 3. The parameters of RED and DQN algorithms in experiment 3 
Scenario    Mean (packets)    Standard deviation (packets) 
RED         69.07             9.47 
DQN         67.93             6.78 

Discussion: we can see that the DQN algorithm outperforms the RED algorithm in terms of keeping the 
queue length close to the target value with little oscillation. The standard deviation values in Table 3 
make this apparent. In all of the preceding experiments, the overshoot of the DQN response never 
surpassed 160 packets; in contrast, the RED algorithm's overshoot has never exceeded 160 packets. 


3.4. Comparison between RED and DQN with multiple parameters 

To study the effect of changing the bottleneck link capacity on the network, the experiments 
summarized in Table 4 were carried out on the network shown in Figure 3, with 20 TCP sessions (N), 
different bottleneck link capacities (C), and a data rate of 1 Mbps. The mean queue length, throughput, and 
drop rate listed in Table 4 are plotted in Figures 7, 8, and 9. We can see that the overall network 
performance of the DQN algorithm is better than that of the RED algorithm. 


Table 4. Average of mean, throughput, and drop rate for different values of C 
C (Mbps)    Mean packets RED    Mean packets DQN    Throughput RED (Mbps)    Throughput DQN (Mbps)    Drop RED    Drop DQN 
0.5         115.747             114.323             0.3796                   0.3967                   24.30%      22.90% 
1           119.959             117.727             0.8907                   0.918                    20.30%      19.40% 
5           69.07               67.93               4.935                    4.972                    15.40%      14.60% 

Figure 8. Comparison between throughputs in RED and DQN 

Figure 9. Comparison between drops in RED and DQN 
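
The differences between the two algorithms can be read directly from Table 4. The short Python sketch below computes them; the numbers are copied from Table 4, while the dictionary layout, the function name `dqn_gain`, and the reading of throughput in Mbps are assumptions made for illustration.

```python
# Values copied from Table 4:
# C (Mbps) -> (throughput RED, throughput DQN, drop RED %, drop DQN %)
TABLE_4 = {
    0.5: (0.3796, 0.3967, 24.3, 22.9),
    1.0: (0.8907, 0.918, 20.3, 19.4),
    5.0: (4.935, 4.972, 15.4, 14.6),
}

def dqn_gain(c):
    """Throughput gain (Mbps) and drop-rate reduction (percentage points) of DQN over RED."""
    thr_red, thr_dqn, drop_red, drop_dqn = TABLE_4[c]
    return round(thr_dqn - thr_red, 4), round(drop_red - drop_dqn, 1)

for c in TABLE_4:
    thr_gain, drop_cut = dqn_gain(c)
    print(f"C={c} Mbps: throughput +{thr_gain} Mbps, drop rate -{drop_cut} points")
```

In every row the DQN algorithm has the higher throughput and the lower drop rate, and the drop rate of both algorithms falls as C grows.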


4. CONCLUSION 


Python was used to implement the Q-learning algorithm, which chooses the best parameters for 
reducing dropped packets. An environment that learns through network management was obtained; it 
assigns the highest drop probability to the appropriate packets, which in turn provides better network 
efficiency by reducing or preventing congestion. The results obtained were better than those of RED. In 
future work, the action selection should be improved using fuzzy-Q. 